The Design of a Nearest-Neighbor Classifier and Its Use for Japanese Character Recognition

Authors

  • Tao Hong
  • Stephen W. Lam
  • Jonathan J. Hull
  • Sargur N. Srihari
Abstract

The nearest neighbor (NN) approach is a powerful nonparametric technique for pattern classification tasks. Although the brute-force NN algorithm is simple and highly accurate, its computational cost is usually very high, especially for applications such as Japanese character recognition, in which the number of categories is large. Many methods have been proposed to improve the efficiency of NN classifiers by reducing the number of prototypes and speeding up NN search. In this paper, algorithms for prototype reduction, hierarchical prototype organization and fast NN search are described. To remove redundant category prototypes and to avoid redundant comparisons, the algorithms exploit geometrical information about a given prototype set, which is represented approximately by computing the k-nearest/farthest neighbors of each prototype. The performance of an NN classifier using these algorithms for Japanese character recognition is reported. Given a large Japanese character training set, only a small portion of the samples in the set is selected as prototypes. The fast NN search algorithm is as accurate as the straightforward algorithm, while its average number of comparisons is about two thirds of that of the straightforward algorithm. The average number of comparisons is further reduced to less than one third of the total number of prototypes if the prototypes are organized hierarchically.
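
The idea of avoiding redundant comparisons by using geometrical information about the prototype set can be illustrated with a small sketch. The code below is not the authors' algorithm, which the abstract does not spell out. It swaps in a generic triangle-inequality pruning scheme over precomputed prototype-to-prototype distances, and the names `brute_force_nn`, `pairwise_distances` and `pruned_nn` are hypothetical. It assumes Euclidean distance on plain feature vectors.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_nn(query, prototypes):
    """Compare the query with every prototype.

    Returns (index of nearest prototype, number of distance computations).
    """
    best = min(range(len(prototypes)), key=lambda i: dist(query, prototypes[i]))
    return best, len(prototypes)


def pairwise_distances(prototypes):
    """Precompute all prototype-to-prototype distances once, offline."""
    return [[dist(p, q) for q in prototypes] for p in prototypes]


def pruned_nn(query, prototypes, pp):
    """Exact NN search that skips prototypes ruled out by the triangle inequality.

    For any prototypes i and j: dist(query, j) >= |dist(query, i) - pp[i][j]|.
    If that lower bound is already no better than the best distance found,
    prototype j is skipped without computing dist(query, j).
    (Illustrative stand-in, NOT the method described in the paper.)
    """
    n = len(prototypes)
    lower = [0.0] * n  # best known lower bound on dist(query, prototype j)
    best, best_d, comparisons = -1, math.inf, 0
    for i in range(n):
        if lower[i] >= best_d:
            continue  # cannot beat the current best, so skip the comparison
        d = dist(query, prototypes[i])
        comparisons += 1
        if d < best_d:
            best, best_d = i, d
        # Tighten the lower bounds of the prototypes not yet visited.
        for j in range(i + 1, n):
            lower[j] = max(lower[j], abs(d - pp[i][j]))
    return best, comparisons


if __name__ == "__main__":
    random.seed(0)
    # Synthetic 8-dimensional "prototypes"; real character features would differ.
    protos = [[random.random() for _ in range(8)] for _ in range(200)]
    pp = pairwise_distances(protos)
    q = [random.random() for _ in range(8)]
    print("brute force:", brute_force_nn(q, protos))
    print("pruned     :", pruned_nn(q, protos, pp))
```

The pruned search returns a prototype at the same distance as the brute-force search while computing no more query-to-prototype distances. How many comparisons it saves depends on the data and the visiting order, so this sketch makes no claim about reproducing the reduction reported in the abstract. Updating the bounds costs a pass over the remaining prototypes, so the approach pays off mainly when each distance computation is expensive.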

Similar resources

Cherry Blossom: A System for Japanese Character Recognition

A general purpose Japanese character recognition system, Cherry Blossom, has been developed at CEDAR in past years. It is designed to recognize Japanese document images in low resolution or with poor print quality. The system includes modules for page skew correction, document segmentation, text segmentation, character recognition and postprocessing. The API code for each module has been develo...


The design of a nearest-neighbor classifier and its use for Japanese character recognition

The nearest neighbor (NN) approach is a powerful nonparametric technique for pattern classification tasks. In this paper, algorithms for prototype reduction, hierarchical prototype organization and fast NN search are described. To remove redundant category prototypes and to avoid redundant comparisons, the algorithms exploit geometrical information of a given prototype set which is represented a...


Symbolic Nearest Mean Classifiers

Piew Datta and Dennis Kibler, Department of Information and Computer Science, University of California, Irvine, CA 92717. Abstract: The minimum-distance classifier summarizes each class with a prototype and then uses a nearest neighbor approach for classification. Three drawbacks of the minimum-distance classifier are its inability to work with symbolic attributes, weigh at...


Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the most active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheques and conversion of handwritten documents into structured text form. A neural network (NN), with its inherent learning ability, offers promising solutions for handwritten characte...


Presentation of K Nearest Neighbor Gaussian Interpolation and comparing it with Fuzzy Interpolation in Speech Recognition

The hidden Markov model (HMM) is a popular statistical method used in continuous and discrete speech recognition. The probability density function of the observation vectors in each state is estimated with discrete-density or continuous-density modeling. The performance (in correct word recognition rate) of the continuous-density HMM is higher than that of the discrete-density HMM, but its computation complexity is very ...


Publication date: 1995